[Azure Logs] Add dedicated aadgraphactivitylogs data stream #18880

Merged
terrancedejesus merged 26 commits into main from enhancement/azure-ad-graph-activitylogs on May 14, 2026

Contributor

@terrancedejesus terrancedejesus commented May 7, 2026

Adds the azure.aadgraphactivitylogs data stream to ingest the AzureADGraphActivityLogs diagnostic category from Microsoft Entra ID, parallel to azure.graphactivitylogs for Microsoft Graph. Without this, AAD Graph events fall through to azure.platformlogs and the AAD-Graph-specific properties survive only inside event.original.

Proposed commit message

azure: add aadgraphactivitylogs data stream

Add a dedicated data stream for the AzureADGraphActivityLogs
diagnostic category from Microsoft Entra ID. Without this,
legacy Azure AD Graph (graph.windows.net) events fall through
to the platformlogs catch-all and lose schema-aware parsing.

The events router maps routing.category ==
"AzureADGraphActivityLogs" to the new dataset. The ingest
pipeline extracts ECS fields: event.action from HTTP method +
URI collection, event.outcome from response status,
event.category [iam, web], and related.user including the
OAuth app_id for client correlation.

Legacy AAD Graph is still actively used by Microsoft first-party
tooling, older line-of-business apps, and adversary tooling
(ROADtools, AzureHound v1, AADInternals). The dedicated dataset
makes these events available for detection rules and dashboards.
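
As a rough illustration of the field derivation described in the commit message, the sketch below maps an HTTP method, request URI, and response status to ECS-style values. It is not the ingest pipeline itself: the `derive_ecs` helper, the `method-collection` format for event.action, the assumption that the collection is the second path segment, and the rule that statuses below 400 count as success are all hypothetical choices made for this example.

```python
from urllib.parse import urlsplit


def derive_ecs(method: str, request_uri: str, status: int) -> dict:
    """Hypothetical mapping; the real pipeline's formats may differ."""
    segments = [s for s in urlsplit(request_uri).path.split("/") if s]
    # Assumed URI shape: /<tenant>/<collection>/..., so take the second segment.
    collection = segments[1] if len(segments) > 1 else "unknown"
    # Assumed rule: any status below 400 is treated as a success.
    outcome = "success" if 200 <= status < 400 else "failure"
    return {
        "event.action": f"{method.lower()}-{collection}",
        "event.outcome": outcome,
        "event.category": ["iam", "web"],
    }


example = derive_ecs(
    "GET", "https://graph.windows.net/contoso.example/users?api-version=1.6", 200
)
print(example["event.action"], example["event.outcome"])
```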

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.
  • I have verified that any added dashboard complies with Kibana's Dashboard good practices. N/A, no dashboards added.

Author's Checklist

  • PII redacted from pipeline test fixtures (tenant IDs, user/session/request UUIDs, sign-in activity IDs, internal user-agents). Synthetic test values throughout.
  • elastic-package check passes; elastic-package test pipeline -d aadgraphactivitylogs passes.
  • End-to-end stack test: events POSTed to logs-azure.events-default are correctly rerouted to logs-azure.aadgraphactivitylogs-default with full ECS field extraction.
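
The rerouting checked in the end-to-end test can be pictured with a small sketch. It models only the one rule named in this PR plus the platformlogs fall-through; the `ROUTES` table and `target_data_stream` helper are hypothetical names, and the real events router contains more rules than are shown here.

```python
# Only the rule described in this PR is modeled; other categories are omitted.
ROUTES = {"AzureADGraphActivityLogs": "azure.aadgraphactivitylogs"}
FALLBACK_DATASET = "azure.platformlogs"  # catch-all mentioned in the description


def target_data_stream(category: str, namespace: str = "default") -> str:
    """Return the logs data stream an event with this category would land in."""
    dataset = ROUTES.get(category, FALLBACK_DATASET)
    return f"logs-{dataset}-{namespace}"


print(target_data_stream("AzureADGraphActivityLogs"))
print(target_data_stream("SomeOtherCategory"))
```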

How to test this PR locally

cd packages/azure
elastic-package check
elastic-package stack up -d -v
elastic-package install
elastic-package test pipeline -d aadgraphactivitylogs

Optional end-to-end:

# Pipe the captured fixtures through the live events router
python3 -c "
import json
with open('data_stream/aadgraphactivitylogs/_dev/test/pipeline/test-aadgraph-activity.log') as f:
    for line in f:
        line = line.strip()
        if line:
            print(json.dumps({'create': {'_index': 'logs-azure.events-default'}}))
            print(json.dumps({'message': line}))
" > /tmp/aadgraph-bulk.ndjson

curl -sk -u user:pass -X POST \
  "https://localhost:9200/_bulk?refresh=wait_for" \
  -H "Content-Type: application/x-ndjson" \
  --data-binary @/tmp/aadgraph-bulk.ndjson

curl -sk -u user:pass  \
  "https://localhost:9200/logs-azure.aadgraphactivitylogs-default/_count"
# expected: {"count":4,...}

  1. Open Discover at https://localhost:5601 (user / pass).
  2. Select the data view logs-azure.aadgraphactivitylogs-*.
  3. Confirm that event.action, event.outcome, http.*, url.path, azure.aadgraphactivitylogs.properties.*, and related.user all populate.

Related issues

Screenshots

Pipeline tests passing locally

[Screenshot: 2026-05-07 at 2 05 54 PM]

Discover view of the new dataset with ECS-parsed events

[Screenshot: 2026-05-07 at 2 07 25 PM]

Expanded document showing the full ECS field tree

[Screenshot: 2026-05-07 at 2 09 42 PM]

terrancedejesus and others added 2 commits May 7, 2026 14:23
Adds the azure.aadgraphactivitylogs data stream to ingest the
AzureADGraphActivityLogs diagnostic category from Microsoft Entra ID,
parallel to azure.graphactivitylogs for Microsoft Graph. Without this,
AAD Graph events fall through to azure.platformlogs and the
AAD-Graph-specific properties survive only inside event.original.
Comment thread packages/azure/changelog.yml Outdated
Comment thread packages/azure/manifest.yml Outdated

Contributor

@efd6 efd6 left a comment

Please update the proposed commit message so that it's something that can be used in context of git; no Markdown, appropriately wrapped etc.

For example based on the current code here (update as needed):

azure: add aadgraphactivitylogs data stream

Add a dedicated data stream for the AzureADGraphActivityLogs
diagnostic category from Microsoft Entra ID. Without this,
legacy Azure AD Graph (graph.windows.net) events fall through
to the platformlogs catch-all and lose schema-aware parsing.

The events router maps routing.category ==
"AzureADGraphActivityLogs" to the new dataset. The ingest
pipeline extracts ECS fields: event.action from HTTP method +
URI collection, event.outcome from response status,
event.category [iam, web], and related.user including the
OAuth app_id for client correlation.

Legacy AAD Graph is still actively used by Microsoft first-party
tooling, older line-of-business apps, and adversary tooling
(ROADtools, AzureHound v1, AADInternals). The dedicated dataset
makes these events available for detection rules and dashboards.

Contributor Author

terrancedejesus commented May 8, 2026

@efd6 proposed commit message updated. Thank you!
Any specific labeling or additional checks?

Contributor

efd6 commented May 8, 2026

The build is complaining:

Error: error validating packages in directory 'packages': error checking data streams from 'packages/azure': package "packages/azure" shares ownership across data streams but these ones [packages/azure/data_stream/aadgraphactivitylogs] lack owners

I think you will need to add a line before this. Who will be the owner of this data stream?

Contributor Author

terrancedejesus commented May 8, 2026

Who will be the owner of this data stream?

Yes, I noticed the Buildkite failure related to owners. There seem to be several owners across the Azure package data streams, so I am not sure which team should be the owner/maintainer. I assume that, since this is the legacy counterpart of the Microsoft Graph data stream, which was provisioned for writing threat detection rules, we mirror its ownership: @elastic/security-service-integrations?

elastic-vault-github-plugin-prod Bot commented May 8, 2026

🚀 Benchmarks report

To see the full report comment with /test benchmark fullreport

Comment thread packages/azure/data_stream/aadgraphactivitylogs/agent/stream/log.yml.hbs Outdated
Comment thread packages/azure/changelog.yml
Contributor

This pipeline does match about half the other pipelines with the same name:

find -name 'azure-shared-pipeline.yml' | xargs md5sum | sort
28624170d9ba87d593c9aef7dd72284a  ./data_stream/application_gateway/elasticsearch/ingest_pipeline/azure-shared-pipeline.yml
28624170d9ba87d593c9aef7dd72284a  ./data_stream/firewall_logs/elasticsearch/ingest_pipeline/azure-shared-pipeline.yml
352b4a0232fcf818b45a174958b161e8  ./data_stream/identity_protection/elasticsearch/ingest_pipeline/azure-shared-pipeline.yml
352b4a0232fcf818b45a174958b161e8  ./data_stream/provisioning/elasticsearch/ingest_pipeline/azure-shared-pipeline.yml
8fb30aa0822189b990f17ba026aeb928  ./data_stream/platformlogs/elasticsearch/ingest_pipeline/azure-shared-pipeline.yml
d9d42c87fb050f4264f140fe5c4fddd0  ./data_stream/aadgraphactivitylogs/elasticsearch/ingest_pipeline/azure-shared-pipeline.yml
d9d42c87fb050f4264f140fe5c4fddd0  ./data_stream/activitylogs/elasticsearch/ingest_pipeline/azure-shared-pipeline.yml
d9d42c87fb050f4264f140fe5c4fddd0  ./data_stream/auditlogs/elasticsearch/ingest_pipeline/azure-shared-pipeline.yml
d9d42c87fb050f4264f140fe5c4fddd0  ./data_stream/eventhub/elasticsearch/ingest_pipeline/azure-shared-pipeline.yml
d9d42c87fb050f4264f140fe5c4fddd0  ./data_stream/graphactivitylogs/elasticsearch/ingest_pipeline/azure-shared-pipeline.yml
d9d42c87fb050f4264f140fe5c4fddd0  ./data_stream/signinlogs/elasticsearch/ingest_pipeline/azure-shared-pipeline.yml
d9d42c87fb050f4264f140fe5c4fddd0  ./data_stream/springcloudlogs/elasticsearch/ingest_pipeline/azure-shared-pipeline.yml

It's a good time to use the new links functionality, at least for the ones that do still match.
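
The same grouping can be sketched in Python for anyone who wants the result as data rather than sorted text. This is only an illustration of the idea behind the `find | md5sum | sort` one-liner; the `group_identical` helper is a hypothetical name, and it uses SHA-256 instead of MD5, which changes the digests but not which files group together.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def group_identical(root: str, name: str = "azure-shared-pipeline.yml") -> dict[str, list[str]]:
    """Group every file called `name` under `root` by the hash of its contents."""
    groups: dict[str, list[str]] = defaultdict(list)
    for path in sorted(Path(root).rglob(name)):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        groups[digest].append(str(path))
    return dict(groups)


if __name__ == "__main__":
    # Run from packages/azure; prints one block per distinct file version.
    for digest, paths in group_identical(".").items():
        print(digest[:12], len(paths), "copies")
        for p in paths:
            print("   ", p)
```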

Contributor Author

Added per recommendations: 7be1278fcf4bcd9e2723782c95c43f5a55f45a2d

Note: I did not convert every copy that could be linked, as that seems out of scope for this PR. Maybe a separate issue/PR for that?

Contributor

It's a step in the right direction. I think it would be good to move that version of the file to the top level and change the identical copies to links. Could be done in this PR.

# place the shared copy at the top level
mkdir -p ./_dev/shared/
mv ./data_stream/graphactivitylogs/elasticsearch/ingest_pipeline/azure-shared-pipeline.yml ./_dev/shared/

# update link to the shared copy
echo "../../../../_dev/shared/azure-shared-pipeline.yml d9410a8b01f785a2e328560cc5d9a2286c8ca5b4e6ab5ec0edfaba385ddb0fb9" > \
  ./data_stream/aadgraphactivitylogs/elasticsearch/ingest_pipeline/azure-shared-pipeline.yml.link

# remove other hard copies
rm ./data_stream/activitylogs/elasticsearch/ingest_pipeline/azure-shared-pipeline.yml
rm ./data_stream/auditlogs/elasticsearch/ingest_pipeline/azure-shared-pipeline.yml
rm ./data_stream/eventhub/elasticsearch/ingest_pipeline/azure-shared-pipeline.yml
rm ./data_stream/signinlogs/elasticsearch/ingest_pipeline/azure-shared-pipeline.yml
rm ./data_stream/springcloudlogs/elasticsearch/ingest_pipeline/azure-shared-pipeline.yml

# add other links to the shared copy
cp ./data_stream/aadgraphactivitylogs/elasticsearch/ingest_pipeline/azure-shared-pipeline.yml.link ./data_stream/activitylogs/elasticsearch/ingest_pipeline/azure-shared-pipeline.yml.link
cp ./data_stream/aadgraphactivitylogs/elasticsearch/ingest_pipeline/azure-shared-pipeline.yml.link ./data_stream/auditlogs/elasticsearch/ingest_pipeline/azure-shared-pipeline.yml.link
cp ./data_stream/aadgraphactivitylogs/elasticsearch/ingest_pipeline/azure-shared-pipeline.yml.link ./data_stream/eventhub/elasticsearch/ingest_pipeline/azure-shared-pipeline.yml.link
cp ./data_stream/aadgraphactivitylogs/elasticsearch/ingest_pipeline/azure-shared-pipeline.yml.link ./data_stream/graphactivitylogs/elasticsearch/ingest_pipeline/azure-shared-pipeline.yml.link
cp ./data_stream/aadgraphactivitylogs/elasticsearch/ingest_pipeline/azure-shared-pipeline.yml.link ./data_stream/signinlogs/elasticsearch/ingest_pipeline/azure-shared-pipeline.yml.link
cp ./data_stream/aadgraphactivitylogs/elasticsearch/ingest_pipeline/azure-shared-pipeline.yml.link ./data_stream/springcloudlogs/elasticsearch/ingest_pipeline/azure-shared-pipeline.yml.link
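
For reference, a link file like the one written by the `echo` above could also be generated rather than typed by hand. The sketch below assumes that the second field is the SHA-256 of the target file's contents (the 64-character hex value suggests this, but it is an assumption) and that the first field is the target path relative to the link file's directory; `write_link` is a hypothetical helper, not part of elastic-package.

```python
import hashlib
import os
from pathlib import Path


def write_link(target: Path, link_file: Path) -> str:
    """Write '<relative path> <sha256>' to link_file and return that line."""
    # Assumption: the checksum is the SHA-256 of the target's raw bytes.
    digest = hashlib.sha256(target.read_bytes()).hexdigest()
    # The path is expressed relative to the directory holding the link file.
    rel = Path(os.path.relpath(target, start=link_file.parent)).as_posix()
    line = f"{rel} {digest}"
    link_file.write_text(line + "\n")
    return line


if __name__ == "__main__":
    # Example call, run from packages/azure once the shared copy exists.
    print(write_link(
        Path("_dev/shared/azure-shared-pipeline.yml"),
        Path("data_stream/auditlogs/elasticsearch/ingest_pipeline/azure-shared-pipeline.yml.link"),
    ))
```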

Comment thread packages/azure/data_stream/aadgraphactivitylogs/fields/fields.yml
Comment thread packages/azure/data_stream/events/routing_rules.yml
Contributor

TL;DR

Buildkite failed before tests ran because the PR checkout hook hit a GitHub remote 500 while fetching the target branch, so this is an infrastructure/transient fetch failure rather than a code regression in this PR. Re-run the build first; if it repeats, add retry/backoff around the target-branch git fetch in the hook.

Remediation

  • Re-run Buildkite build #42650 (same commit 30046ce36184e5916a2b5f69455b6c5b0fe976f6).
  • If this recurs, harden .buildkite/hooks/post-checkout by retrying git fetch -v origin "${target_branch}" (line 28) with bounded retries/backoff, then re-run Get reference from target branch.

Investigation details

Root Cause

The failing step is Get reference from target branch, which executes the repository post-checkout hook. The hook fetches the PR base branch in .buildkite/hooks/post-checkout and exited non-zero when GitHub returned HTTP 500 during git fetch.

Relevant code path:

  • .buildkite/hooks/post-checkout:28: git fetch -v origin "${target_branch}"
  • .buildkite/hooks/post-checkout:70 invokes checkout_merge, so this fetch is required for all PR builds.

The PR changes are focused on Azure package files/CODEOWNERS and do not modify Buildkite hooks, which supports this being infra/transient rather than a PR logic/config bug.

Evidence

remote: Internal Server Error
fatal: unable to access 'https://github.com/elastic/integrations.git/': The requested URL returned error: 500
🚨 Error: running "repository post-checkout" shell hook: The repository post-checkout hook exited with status 128

Verification

  • Not run locally (failure is in CI checkout/bootstrap phase before package checks/tests).

Follow-up

If the retry succeeds, no PR code change is needed. If repeated 500s continue across builds, treat as persistent CI infrastructure issue and add fetch retry logic in the hook.
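
The bounded retry with backoff suggested above could look roughly like the following. The actual hook is a shell script, so this Python version only sketches the control flow; `fetch_with_retry`, the attempt count, and the delay values are hypothetical, and the `run` and `sleep` parameters exist only so the logic can be exercised without a network.

```python
import subprocess
import time
from typing import Callable, Sequence


def fetch_with_retry(
    cmd: Sequence[str],
    attempts: int = 3,
    base_delay: float = 2.0,
    run: Callable = subprocess.run,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run cmd up to `attempts` times, doubling the wait after each failure."""
    for attempt in range(1, attempts + 1):
        result = run(list(cmd))
        if result.returncode == 0:
            return True
        if attempt < attempts:
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
    return False


# Example call (not executed here):
# fetch_with_retry(["git", "fetch", "-v", "origin", "main"])
```

Keeping the number of attempts small means a persistent outage still fails the step quickly instead of hiding it.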


What is this? | From workflow: PR Buildkite Detective

Contributor

github-actions Bot commented May 11, 2026

✅ Vale Linting Results

No issues found on modified lines!


The Vale linter checks documentation changes against the Elastic Docs style guide.

To use Vale locally or report issues, refer to Elastic style guide for Vale.

@andrewkroh andrewkroh added the documentation Improvements or additions to documentation. Applied to PRs that modify *.md files. label May 11, 2026
@terrancedejesus
Contributor Author

@chrisberkhout - Thank you again for the feedback and guidance. All comments have been addressed; for each one, I linked the commit for easier reference. The latest Buildkite build has succeeded.

Additionally, I rebuilt a new stack locally and ingested AAD Graph Activity Logs to verify that the logs are as expected and the integration updates are accurate. Please let me know if there is anything else to add or adjust.

@terrancedejesus terrancedejesus marked this pull request as ready for review May 11, 2026 16:47
@terrancedejesus terrancedejesus requested review from a team as code owners May 11, 2026 16:47
@terrancedejesus terrancedejesus self-assigned this May 11, 2026
@andrewkroh andrewkroh added the Team:obs-ds-hosted-services Observability Hosted Services team [elastic/obs-ds-hosted-services] label May 11, 2026
Contributor

@zmoog zmoog left a comment

LGTM, just a few nits.

    tag: pipeline-azure-shared-pipeline
- fingerprint:
    fields:
      - azure.aadgraphactivitylogs.properties.request_uri
Contributor

@zmoog zmoog May 12, 2026

The pipeline removes the azure.aadgraphactivitylogs.properties.request_uri field a few lines before if url.original is not null. Should we keep it or replace this with a different field?

Contributor Author

@terrancedejesus terrancedejesus May 12, 2026

@zmoog Good catch, and thank you for the thorough review! If I understand correctly, the remove processor a few lines up was stripping properties.request_uri before the fingerprint processor could use it, so that input was silently dropping out (ignore_missing: true masked it). The hash was effectively using only http.request.id, azure.tenant_id, and properties.time_generated; I confirmed this against the data.

Swapped it to use url.original instead, which holds the same value (the uri_parts step keeps a copy via keep_original: true) and is still around at fingerprint time. Re-streamed live events through twice end-to-end and confirmed the _ids are stable now.

fix ref: 8fae9c52502f6fa6d3153d78ea16fa265dc37851
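
The effect described in this thread can be reproduced with a toy model. The sketch below is not the Elasticsearch fingerprint processor's algorithm; `toy_fingerprint` is a hypothetical stand-in that hashes whichever listed fields are present and silently skips the rest, mimicking ignore_missing: true. The flattened field names and sample values are illustrative only.

```python
import hashlib


def toy_fingerprint(doc: dict, fields: list[str]) -> str:
    """Hash the listed fields that exist in doc; missing ones are skipped."""
    h = hashlib.sha256()
    for name in sorted(fields):
        if name in doc:  # mimics ignore_missing: true
            h.update(f"{name}={doc[name]}|".encode())
    return h.hexdigest()


def make_event(uri: str) -> dict:
    # Two events built this way differ only in the request URI.
    return {
        "http.request.id": "req-1",
        "azure.tenant_id": "tenant-1",
        "properties.time_generated": "2026-05-07T00:00:00Z",
        "url.original": uri,  # copy kept by the URI parsing step
        # properties.request_uri is already removed at this point
    }


base = ["http.request.id", "azure.tenant_id", "properties.time_generated"]
a = make_event("https://graph.windows.net/t/users")
b = make_event("https://graph.windows.net/t/groups")

# Referencing the removed field: the URI contributes nothing to the hash.
before = [toy_fingerprint(e, base + ["properties.request_uri"]) for e in (a, b)]
# Referencing url.original: the URI is part of the hash again.
after = [toy_fingerprint(e, base + ["url.original"]) for e in (a, b)]
print(before[0] == before[1], after[0] == after[1])
```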

Contributor

@muthu-mps muthu-mps left a comment

code owner approval!

Contributor

@chrisberkhout chrisberkhout left a comment

Thanks. Looks good.

I gave a script to adopt links for the other uses of the same version of the ingest pipeline.

@terrancedejesus terrancedejesus requested a review from efd6 May 13, 2026 19:12
@elasticmachine

💚 Build Succeeded

History

cc @terrancedejesus

@terrancedejesus terrancedejesus merged commit ce28db2 into main May 14, 2026
12 checks passed
@terrancedejesus terrancedejesus deleted the enhancement/azure-ad-graph-activitylogs branch May 14, 2026 14:03
@elastic-vault-github-plugin-prod

Package azure - 1.37.0 containing this change is available at https://epr.elastic.co/package/azure/1.37.0/

Labels

  • documentation: Improvements or additions to documentation. Applied to PRs that modify *.md files.
  • Integration:azure: Azure Logs
  • Team:obs-ds-hosted-services: Observability Hosted Services team [elastic/obs-ds-hosted-services]

Development

Successfully merging this pull request may close these issues.

[Azure]: Extend Support to AAD Graph Activity Logs

7 participants